Selective Bayesian classifier: feature selection for the Naive Bayesian classifier using decision trees
نویسندگان
چکیده
It is known that Naive Bayesian classifier (NB) works very well on some domains, and poorly on some. The performance of NB suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than the Nafve Bayesian algorithm on such domains. This paper describes a Selective Bayesian classifier (SBC) that simply uses only those features that C4.5 would use in its decision tree when learning a small example of a training set, a combination of the two different natures of classifiers. Experiments conducted on eleven datasets indicate that SBC performs reliably better than NB on all domains, and SBC outperforms C4.5 on many datasets of which C4.5 outperform NB. SBC also can eliminate, on most cases, more than half of the original attributes, which can greatly reduce the size of the training and test data, as well as the running time. Further, the SBC algorithm typically learns faster than both C4.5 and NB, needing fewer training examples to reach high accuracy of classification.
منابع مشابه
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
متن کاملFeature Selection for the Naive Bayesian Classifier Using Decision Trees
It is known that Naïve Bayesian classifier (NB) works very well on some domains, and poorly on some. The performance of NB suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than the Naïve Bayesian a lgorithm on such domains. This paper describes a Selective Bayesian classifier (SBC) that simply uses only those features that C4....
متن کاملHealthcare data mining : predicting inpatient length of stay
Data mining approaches have been widely applied in the field of healthcare. At the same time it is recognized that most healthcare datasets are full of missing values. In this paper we apply decision trees, Naive Bayesian classifiers and feature selection methods to a geriatric hospital dataset in order to predict inpatient length of stay, especially for the long stay patients. Index Terms NBI,...
متن کاملA New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...
متن کاملThe Influences of Number and Nature of Classifiers on Consensus Feature Selection
Wrapper feature selection approaches are widely used to select a small subset of relevant features from a dataset. However, Wrappers suffer from the fact that they only use a single classifier when selecting the features. The downside to this approach is that each classifier will have its own biases and will therefore select very different features. In order to overcome the biases of individual...
متن کامل